Speaker recognizability evaluation of a voicefont-based text-to-speech system

Authors

Masaharu Sakamoto

Takashi Saito

Abstract

We have developed a new text-to-speech system based on VoiceFont technology. A VoiceFont is a voice dictionary for speech synthesis that holds the acoustic and prosodic characteristics extracted from a speaker's voice corpus. A text-to-speech system using a VoiceFont can synthetically mimic the voice of the donor speaker. In this paper, we evaluate the speaker recognizability of the synthetic speech, that is, whether the synthetic speech can be recognized as the donor speaker's voice. We conducted a subjective evaluation of five VoiceFonts and report the results here. The results show that our VoiceFont-based text-to-speech system retains the acoustic and prosodic characteristics of the donor speaker, and that its synthetic speech can be recognized as the donor speaker's voice. We also report how strongly the spectral characteristics, phoneme duration, and pitch frequency each affect speaker recognizability.
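The abstract does not say how listener judgments were scored. As a minimal sketch, assuming each listening trial records which VoiceFont was played, which synthesis condition was used (hypothetical labels, for example whether the spectrum, phoneme duration, or pitch was taken from the donor), and whether the listener judged the voice to be the donor's, a recognition rate per VoiceFont and condition could be tallied like this. The function name `recognition_rates` and the trial format are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict


def recognition_rates(responses):
    """Tally the fraction of trials judged to be the donor speaker.

    Hypothetical trial format (an assumption, not from the paper):
    each item is (voicefont_id, condition, judged_as_donor), where
    judged_as_donor is True if the listener recognized the donor's voice.
    Returns {(voicefont_id, condition): recognition_rate}.
    """
    hits = defaultdict(int)    # trials recognized as the donor, per key
    totals = defaultdict(int)  # all trials, per key
    for voicefont, condition, judged_as_donor in responses:
        key = (voicefont, condition)
        totals[key] += 1
        if judged_as_donor:
            hits[key] += 1
    # Every key in totals has at least one trial, so no division by zero.
    return {key: hits[key] / totals[key] for key in totals}


# Toy, made-up trials for illustration only (not the paper's data).
trials = [
    ("VF1", "all_donor_features", True),
    ("VF1", "all_donor_features", False),
    ("VF1", "all_donor_features", True),
    ("VF2", "all_donor_features", True),
]
print(recognition_rates(trials))
```

Grouping by (VoiceFont, condition) lets the same tally compare conditions within one donor, which is the kind of comparison the abstract describes for spectral characteristics, phoneme duration, and pitch.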


Similar resources

A VoiceFont Creation Framework for Generating Personalized Voices

This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that make an attempt to synthesize many high quality voice personal...


A method of creating a new speaker’s voicefont in a text-to-speech system

This paper presents a method of creating a new speaker’s voice database (VoiceFont) by which the voice of the donor speaker can be synthesized for mimicking in a text-to-speech system. A VoiceFont creation system, “VoiceFont Builder”, is developed to make the creation process easier and more effective than current systems. The voice feature extraction applied in the system is a simple but power...


A naïve de-lambing method for speaker identification

This paper addresses the issue of closed-set text-independent speaker identification from speech samples recorded over telephone. It is known that the variability in speaker identification performance can be attributed to many factors. One major factor is the inherent differences in the recognizability of different speakers. In speaker recognition systems such differences are characterized by the...


Evaluating the effects of communication systems on speaker recognizability by human listeners: The Diagnostic Speaker Recognizability Test (DSRT)

The Diagnostic Speaker Recognizability Test (DSRT) is based on the principle that recognition of voices by human listeners presupposes discrimination with respect to various perceived voice traits (PVT’s). With knowledge of the nature of such traits, we can evaluate the impact of speech degradation on speaker recognizability in terms of its effects on the discriminability of the various PVT’s. ...
